Adaptive partitioning and scheduling method of convolutional neural network inference model on heterogeneous platforms
Shaofa SHANG, Lin JIANG, Yuancheng LI, Yun ZHU
Journal of Computer Applications, 2023, 43(9): 2828-2835. DOI: 10.11772/j.issn.1001-9081.2022081177
Abstract

To address the low hardware resource utilization and high latency of Convolutional Neural Network (CNN) inference on heterogeneous platforms, an adaptive partitioning and scheduling method for CNN inference models was proposed. Firstly, the key operators of the CNN were extracted by traversing the computational graph to partition the model adaptively, thereby increasing the flexibility of the scheduling strategy. Then, guided by performance measurements and a critical-path greedy search algorithm, and according to the running characteristics of each sub-model on the CPU-GPU heterogeneous platform, the optimal running load was selected for each sub-model to improve its inference speed. Finally, the cross-device scheduling mechanism of TVM (Tensor Virtual Machine) was used to configure the dependencies and running loads of the sub-models, achieving adaptive scheduling of model inference and reducing inter-device communication delay. Experimental results show that, with no loss of inference accuracy, the proposed method improves inference speed by 5.88% to 19.05% on GPU and by 45.45% to 311.46% on CPU compared with TVM operator-level optimization.
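
The critical-path greedy selection summarized in the abstract can be pictured with a minimal sketch. The sub-model names, per-device latencies, transfer penalty, and the greedy_schedule helper below are illustrative assumptions, not the authors' implementation or measured data; in the paper the resulting placement would then be realized through TVM's cross-device scheduling rather than this standalone script.

```python
# Illustrative sketch (not the paper's code): greedily assign each sub-model,
# visited in critical-path order, to the device with the lowest estimated cost,
# where cost = measured compute latency + a transfer penalty whenever the
# previous sub-model runs on a different device. All numbers are hypothetical.

from typing import Dict, List

DEVICES = ["cpu", "gpu"]
TRANSFER_MS = 0.8  # assumed per-boundary CPU<->GPU copy cost (hypothetical)

# Hypothetical measured latencies (ms) of each sub-model on each device.
measured: Dict[str, Dict[str, float]] = {
    "conv_block1": {"cpu": 6.4, "gpu": 1.2},
    "conv_block2": {"cpu": 9.1, "gpu": 1.5},
    "pooling":     {"cpu": 0.6, "gpu": 0.9},
    "fc_head":     {"cpu": 1.1, "gpu": 0.7},
}

def greedy_schedule(order: List[str]) -> Dict[str, str]:
    """Assign each sub-model to the device minimizing compute + transfer cost."""
    placement: Dict[str, str] = {}
    prev_dev = None
    for sub in order:
        best_dev, best_cost = None, float("inf")
        for dev in DEVICES:
            cost = measured[sub][dev]
            if prev_dev is not None and dev != prev_dev:
                cost += TRANSFER_MS  # penalize a cross-device hand-off
            if cost < best_cost:
                best_dev, best_cost = dev, cost
        placement[sub] = best_dev
        prev_dev = best_dev
    return placement

if __name__ == "__main__":
    # Sub-models listed along the critical path of the partitioned graph.
    print(greedy_schedule(["conv_block1", "conv_block2", "pooling", "fc_head"]))
```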
